XGBoost explainability

This Python notebook compares the built-in explainability features of XGBoost with the one provided by LIME. The outline of this notebook is as follows

Brief introduction and references about Shapley values & SHAP
Example comparing XGBoost feature contribution calculations before and after SHAP
Comparison between XGBoost, XGBoost with SHAP and LIME
XGBoost model explaining capabilities

Setup

Imports



In [1]:

    
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn.preprocessing import LabelBinarizer, StandardScaler, LabelEncoder
import xgboost as xgb
import lime.lime_tabular
from graphviz import Digraph

Initializations



In [2]:

    
np.random.seed(1)
%matplotlib inline

Patch XGBoost tree visualization to include leaf index



In [3]:

    
def _parse_node_patched_with_leaf_index(graph, text):
    """parse dumped node"""
    match = xgb.plotting._NODEPAT.match(text)
    if match is not None:
        node = match.group(1)
        graph.node(node, label=match.group(2), shape='circle')
        return node
    match = xgb.plotting._LEAFPAT.match(text)
    if match is not None:
        node = match.group(1)
        graph.node(node, label=match.group(2) +
                   '\nleaf index: ' + node, shape='box')
        return node
    raise ValueError('Unable to parse node: {0}'.format(text))


old_parse_node = xgb.plotting._parse_node
xgb.plotting._parse_node = _parse_node_patched_with_leaf_index

Shapley Values & SHAP (SHapley Additive exPlanation)

The Shapley value is a solution concept in cooperative game theory.

SHAP values are defined as the Shapley values of a conditional expectation function and are introduced in the publication A unified approach to interpreting model predictions.

Sickness score example

The sickness score example is based on the example given in section two of the Tree SHAP publication.
A sickness score is calculated based on the binary features fever and cough.
Afterwards the feature contribution for a given prediction is determined.



In [4]:

    
columns_ex = ['fever', 'cough', 'bias']


def plot_sickness_graph(label, contributions, cough_yes_leaves=[0, 100]):
    def format_contributions(contribs):
        zipped = zip(columns_ex, contribs[0])

        def combine_contrib(tup): return ": ".join(str(i) for i in tup)
        return '\n'.join(map(combine_contrib, zipped))
    dot = Digraph()
    dot.attr('graph', label=label)
    dot.attr('graph', labelloc='t')
    dot.attr('graph', rank='LR')
    dot.attr('node', shape='box')
    dot.node('R', 'Cough')
    dot.node('F1', 'Fever')
    dot.node('F2', 'Fever')
    dot.node('L1', '0', shape='plaintext')
    dot.node('L2', '0', shape='plaintext')
    dot.node('L3', str(cough_yes_leaves[0]), shape='plaintext')
    dot.node('L4', str(cough_yes_leaves[1]), shape='plaintext')
    dot.edge('R', 'F1', label='No')
    dot.edge('R', 'F2', label='Yes')
    dot.edge('F1', 'L1', label='No')
    dot.edge('F1', 'L2', label='Yes')
    dot.edge('F2', 'L3', label='No')
    dot.edge('F2', 'L4', label='Yes')
    dot.node('Feature contributions\n' +
             format_contributions(contributions), shape='plaintext')
    return dot

Contribution calculation before Tree SHAP

The following two instances of the sickness score example illustrate the contribution values calculated before XGBoost was extended with Tree SHAP, ie. before the pull request #2438 was merged and before commit 78c4188cec31425f708d238160ea3afb67a7250a.



In [5]:

    
contributions_and = [[39.33431244, 35.66350174, 24.99929047]]
plot_sickness_graph(
    'Sickness score example #1 (simple AND)', contributions_and)









    Out[5]:



In [6]:

    
contributions = np.array([[49.9985466, 29.99912071, 29.99913406]])
plot_sickness_graph(
    'Sickness score example #2 (with increased impact of "cough")', contributions, [10, 110])









    Out[6]:



In [7]:

    
contribution_shift_pre_shap = pd.DataFrame(
    contributions - contributions_and, columns=columns_ex)
contribution_shift_pre_shap

The next two instances correspond to the previous ones but the contributions are calculated with Tree SHAP

Both attributes now share the same weight
The weight of cough increases in the second instance



In [8]:

    
data = np.array([[0, 0, 0],
                 [1, 0, 0],
                 [0, 1, 0],
                 [1, 1, 100]]).repeat([100, 100, 100, 100], axis=0)
sickness = pd.DataFrame(data, columns=['fever', 'cough', 'sickness'])
bst_test = xgb.XGBRegressor(objective='reg:linear')
bst_test.fit(sickness.iloc[:, :-1].values, sickness.iloc[:, -1])
dmatrix = xgb.DMatrix(np.array([[1, 1]]))
contributions_and_new = bst_test.get_booster().predict(dmatrix, pred_contribs=True)
plot_sickness_graph('Sickness score example #1 (simple AND)',
                    contributions_and_new)









    Out[8]:



In [9]:

    
data = np.array([[0, 0, 0],
                 [1, 0, 0],
                 [0, 1, 10],
                 [1, 1, 110]]).repeat([100, 100, 100, 100], axis=0)
sickness = pd.DataFrame(data, columns=['fever', 'cough', 'sickness'])
bst_test.fit(sickness.iloc[:, :-1].values, sickness.iloc[:, -1])
dmatrix = xgb.DMatrix(np.array([[1, 1]]))
contributions_new = bst_test.get_booster().predict(dmatrix, pred_contribs=True)
plot_sickness_graph(
    'Sickness score example #2 (with increased impact of "cough")', contributions_new, [10, 110])









    Out[9]:



In [10]:

    
contribution_shift_post_shap = pd.DataFrame(
    contributions_new - contributions_and_new, columns=columns_ex)
contribution_shift_post_shap

Comparison between XGBoost, XGBoost with SHAP and LIME

The dataset being used is the Adult Census Data Set, which has a significant amount of instances and attributes and contains continuous as well as categorical attributes.



In [11]:

    
column_names = ["Age", "Workclass", "fnlwgt", "Education", "Education-Num", "Marital Status", "Occupation",
                "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss", "Hours per week", "Country", "Label"]
df = pd.read_csv('data/adult.data', names=column_names, index_col=False)



In [12]:

    
df









    Out[12]:







  
    
      
      Age
      Workclass
      fnlwgt
      Education
      Education-Num
      Marital Status
      Occupation
      Relationship
      Race
      Sex
      Capital Gain
      Capital Loss
      Hours per week
      Country
      Label
    
  
  
    
      0
      39
      State-gov
      77516
      Bachelors
      13
      Never-married
      Adm-clerical
      Not-in-family
      White
      Male
      2174
      0
      40
      United-States
      <=50K
    
    
      1
      50
      Self-emp-not-inc
      83311
      Bachelors
      13
      Married-civ-spouse
      Exec-managerial
      Husband
      White
      Male
      0
      0
      13
      United-States
      <=50K
    
    
      2
      38
      Private
      215646
      HS-grad
      9
      Divorced
      Handlers-cleaners
      Not-in-family
      White
      Male
      0
      0
      40
      United-States
      <=50K
    
    
      3
      53
      Private
      234721
      11th
      7
      Married-civ-spouse
      Handlers-cleaners
      Husband
      Black
      Male
      0
      0
      40
      United-States
      <=50K
    
    
      4
      28
      Private
      338409
      Bachelors
      13
      Married-civ-spouse
      Prof-specialty
      Wife
      Black
      Female
      0
      0
      40
      Cuba
      <=50K
    
    
      5
      37
      Private
      284582
      Masters
      14
      Married-civ-spouse
      Exec-managerial
      Wife
      White
      Female
      0
      0
      40
      United-States
      <=50K
    
    
      6
      49
      Private
      160187
      9th
      5
      Married-spouse-absent
      Other-service
      Not-in-family
      Black
      Female
      0
      0
      16
      Jamaica
      <=50K
    
    
      7
      52
      Self-emp-not-inc
      209642
      HS-grad
      9
      Married-civ-spouse
      Exec-managerial
      Husband
      White
      Male
      0
      0
      45
      United-States
      >50K
    
    
      8
      31
      Private
      45781
      Masters
      14
      Never-married
      Prof-specialty
      Not-in-family
      White
      Female
      14084
      0
      50
      United-States
      >50K
    
    
      9
      42
      Private
      159449
      Bachelors
      13
      Married-civ-spouse
      Exec-managerial
      Husband
      White
      Male
      5178
      0
      40
      United-States
      >50K
    
    
      10
      37
      Private
      280464
      Some-college
      10
      Married-civ-spouse
      Exec-managerial
      Husband
      Black
      Male
      0
      0
      80
      United-States
      >50K
    
    
      11
      30
      State-gov
      141297
      Bachelors
      13
      Married-civ-spouse
      Prof-specialty
      Husband
      Asian-Pac-Islander
      Male
      0
      0
      40
      India
      >50K
    
    
      12
      23
      Private
      122272
      Bachelors
      13
      Never-married
      Adm-clerical
      Own-child
      White
      Female
      0
      0
      30
      United-States
      <=50K
    
    
      13
      32
      Private
      205019
      Assoc-acdm
      12
      Never-married
      Sales
      Not-in-family
      Black
      Male
      0
      0
      50
      United-States
      <=50K
    
    
      14
      40
      Private
      121772
      Assoc-voc
      11
      Married-civ-spouse
      Craft-repair
      Husband
      Asian-Pac-Islander
      Male
      0
      0
      40
      ?
      >50K
    
    
      15
      34
      Private
      245487
      7th-8th
      4
      Married-civ-spouse
      Transport-moving
      Husband
      Amer-Indian-Eskimo
      Male
      0
      0
      45
      Mexico
      <=50K
    
    
      16
      25
      Self-emp-not-inc
      176756
      HS-grad
      9
      Never-married
      Farming-fishing
      Own-child
      White
      Male
      0
      0
      35
      United-States
      <=50K
    
    
      17
      32
      Private
      186824
      HS-grad
      9
      Never-married
      Machine-op-inspct
      Unmarried
      White
      Male
      0
      0
      40
      United-States
      <=50K
    
    
      18
      38
      Private
      28887
      11th
      7
      Married-civ-spouse
      Sales
      Husband
      White
      Male
      0
      0
      50
      United-States
      <=50K
    
    
      19
      43
      Self-emp-not-inc
      292175
      Masters
      14
      Divorced
      Exec-managerial
      Unmarried
      White
      Female
      0
      0
      45
      United-States
      >50K
    
    
      20
      40
      Private
      193524
      Doctorate
      16
      Married-civ-spouse
      Prof-specialty
      Husband
      White
      Male
      0
      0
      60
      United-States
      >50K
    
    
      21
      54
      Private
      302146
      HS-grad
      9
      Separated
      Other-service
      Unmarried
      Black
      Female
      0
      0
      20
      United-States
      <=50K
    
    
      22
      35
      Federal-gov
      76845
      9th
      5
      Married-civ-spouse
      Farming-fishing
      Husband
      Black
      Male
      0
      0
      40
      United-States
      <=50K
    
    
      23
      43
      Private
      117037
      11th
      7
      Married-civ-spouse
      Transport-moving
      Husband
      White
      Male
      0
      2042
      40
      United-States
      <=50K
    
    
      24
      59
      Private
      109015
      HS-grad
      9
      Divorced
      Tech-support
      Unmarried
      White
      Female
      0
      0
      40
      United-States
      <=50K
    
    
      25
      56
      Local-gov
      216851
      Bachelors
      13
      Married-civ-spouse
      Tech-support
      Husband
      White
      Male
      0
      0
      40
      United-States
      >50K
    
    
      26
      19
      Private
      168294
      HS-grad
      9
      Never-married
      Craft-repair
      Own-child
      White
      Male
      0
      0
      40
      United-States
      <=50K
    
    
      27
      54
      ?
      180211
      Some-college
      10
      Married-civ-spouse
      ?
      Husband
      Asian-Pac-Islander
      Male
      0
      0
      60
      South
      >50K
    
    
      28
      39
      Private
      367260
      HS-grad
      9
      Divorced
      Exec-managerial
      Not-in-family
      White
      Male
      0
      0
      80
      United-States
      <=50K
    
    
      29
      49
      Private
      193366
      HS-grad
      9
      Married-civ-spouse
      Craft-repair
      Husband
      White
      Male
      0
      0
      40
      United-States
      <=50K
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      32531
      30
      ?
      33811
      Bachelors
      13
      Never-married
      ?
      Not-in-family
      Asian-Pac-Islander
      Female
      0
      0
      99
      United-States
      <=50K
    
    
      32532
      34
      Private
      204461
      Doctorate
      16
      Married-civ-spouse
      Prof-specialty
      Husband
      White
      Male
      0
      0
      60
      United-States
      >50K
    
    
      32533
      54
      Private
      337992
      Bachelors
      13
      Married-civ-spouse
      Exec-managerial
      Husband
      Asian-Pac-Islander
      Male
      0
      0
      50
      Japan
      >50K
    
    
      32534
      37
      Private
      179137
      Some-college
      10
      Divorced
      Adm-clerical
      Unmarried
      White
      Female
      0
      0
      39
      United-States
      <=50K
    
    
      32535
      22
      Private
      325033
      12th
      8
      Never-married
      Protective-serv
      Own-child
      Black
      Male
      0
      0
      35
      United-States
      <=50K
    
    
      32536
      34
      Private
      160216
      Bachelors
      13
      Never-married
      Exec-managerial
      Not-in-family
      White
      Female
      0
      0
      55
      United-States
      >50K
    
    
      32537
      30
      Private
      345898
      HS-grad
      9
      Never-married
      Craft-repair
      Not-in-family
      Black
      Male
      0
      0
      46
      United-States
      <=50K
    
    
      32538
      38
      Private
      139180
      Bachelors
      13
      Divorced
      Prof-specialty
      Unmarried
      Black
      Female
      15020
      0
      45
      United-States
      >50K
    
    
      32539
      71
      ?
      287372
      Doctorate
      16
      Married-civ-spouse
      ?
      Husband
      White
      Male
      0
      0
      10
      United-States
      >50K
    
    
      32540
      45
      State-gov
      252208
      HS-grad
      9
      Separated
      Adm-clerical
      Own-child
      White
      Female
      0
      0
      40
      United-States
      <=50K
    
    
      32541
      41
      ?
      202822
      HS-grad
      9
      Separated
      ?
      Not-in-family
      Black
      Female
      0
      0
      32
      United-States
      <=50K
    
    
      32542
      72
      ?
      129912
      HS-grad
      9
      Married-civ-spouse
      ?
      Husband
      White
      Male
      0
      0
      25
      United-States
      <=50K
    
    
      32543
      45
      Local-gov
      119199
      Assoc-acdm
      12
      Divorced
      Prof-specialty
      Unmarried
      White
      Female
      0
      0
      48
      United-States
      <=50K
    
    
      32544
      31
      Private
      199655
      Masters
      14
      Divorced
      Other-service
      Not-in-family
      Other
      Female
      0
      0
      30
      United-States
      <=50K
    
    
      32545
      39
      Local-gov
      111499
      Assoc-acdm
      12
      Married-civ-spouse
      Adm-clerical
      Wife
      White
      Female
      0
      0
      20
      United-States
      >50K
    
    
      32546
      37
      Private
      198216
      Assoc-acdm
      12
      Divorced
      Tech-support
      Not-in-family
      White
      Female
      0
      0
      40
      United-States
      <=50K
    
    
      32547
      43
      Private
      260761
      HS-grad
      9
      Married-civ-spouse
      Machine-op-inspct
      Husband
      White
      Male
      0
      0
      40
      Mexico
      <=50K
    
    
      32548
      65
      Self-emp-not-inc
      99359
      Prof-school
      15
      Never-married
      Prof-specialty
      Not-in-family
      White
      Male
      1086
      0
      60
      United-States
      <=50K
    
    
      32549
      43
      State-gov
      255835
      Some-college
      10
      Divorced
      Adm-clerical
      Other-relative
      White
      Female
      0
      0
      40
      United-States
      <=50K
    
    
      32550
      43
      Self-emp-not-inc
      27242
      Some-college
      10
      Married-civ-spouse
      Craft-repair
      Husband
      White
      Male
      0
      0
      50
      United-States
      <=50K
    
    
      32551
      32
      Private
      34066
      10th
      6
      Married-civ-spouse
      Handlers-cleaners
      Husband
      Amer-Indian-Eskimo
      Male
      0
      0
      40
      United-States
      <=50K
    
    
      32552
      43
      Private
      84661
      Assoc-voc
      11
      Married-civ-spouse
      Sales
      Husband
      White
      Male
      0
      0
      45
      United-States
      <=50K
    
    
      32553
      32
      Private
      116138
      Masters
      14
      Never-married
      Tech-support
      Not-in-family
      Asian-Pac-Islander
      Male
      0
      0
      11
      Taiwan
      <=50K
    
    
      32554
      53
      Private
      321865
      Masters
      14
      Married-civ-spouse
      Exec-managerial
      Husband
      White
      Male
      0
      0
      40
      United-States
      >50K
    
    
      32555
      22
      Private
      310152
      Some-college
      10
      Never-married
      Protective-serv
      Not-in-family
      White
      Male
      0
      0
      40
      United-States
      <=50K
    
    
      32556
      27
      Private
      257302
      Assoc-acdm
      12
      Married-civ-spouse
      Tech-support
      Wife
      White
      Female
      0
      0
      38
      United-States
      <=50K
    
    
      32557
      40
      Private
      154374
      HS-grad
      9
      Married-civ-spouse
      Machine-op-inspct
      Husband
      White
      Male
      0
      0
      40
      United-States
      >50K
    
    
      32558
      58
      Private
      151910
      HS-grad
      9
      Widowed
      Adm-clerical
      Unmarried
      White
      Female
      0
      0
      40
      United-States
      <=50K
    
    
      32559
      22
      Private
      201490
      HS-grad
      9
      Never-married
      Adm-clerical
      Own-child
      White
      Male
      0
      0
      20
      United-States
      <=50K
    
    
      32560
      52
      Self-emp-inc
      287927
      HS-grad
      9
      Married-civ-spouse
      Exec-managerial
      Wife
      White
      Female
      15024
      0
      40
      United-States
      >50K
    
  

32561 rows × 15 columns

Map to processable features



In [13]:

    
from sklearn_pandas import DataFrameMapper, cross_val_score
label_binarizer = LabelBinarizer()
mapper = DataFrameMapper([
    (['Age'], StandardScaler()),
    ('Workclass', LabelEncoder()),
    (['fnlwgt'], StandardScaler()),
    ('Education', LabelEncoder()),
    (['Education-Num'], StandardScaler()),
    ('Marital Status', LabelEncoder()),
    ('Occupation', LabelEncoder()),
    ('Relationship', LabelEncoder()),
    ('Race', LabelEncoder()),
    ('Sex', LabelEncoder()),
    (['Capital Gain'], StandardScaler()),
    (['Capital Loss'], StandardScaler()),
    (['Hours per week'], StandardScaler()),
    ('Country', LabelEncoder()),
    ('Label', label_binarizer)
],
    df_out=True
)
fitted_df = mapper.fit_transform(df)









    



/usr/local/Anaconda3-5.0.0-Linux-x86_64/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)

Fit XGBoost classifier



In [14]:

    
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    fitted_df.iloc[:, :-1], fitted_df.iloc[:, -1], test_size=0.20)
bst = xgb.XGBClassifier(n_estimators=300, max_depth=5)
bst.fit(X_train.values, y_train.values)









    Out[14]:





XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=5, min_child_weight=1, missing=None, n_estimators=300,
       n_jobs=1, nthread=None, objective='binary:logistic', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
       silent=True, subsample=1)



In [15]:

    
from sklearn.metrics import accuracy_score
accuracy_score(y_test.values, bst.predict(X_test.values))









    Out[15]:





0.87148779364348228

Explain a specific instance



In [16]:

    
i = 1412
X_train.iloc[i, :]









    Out[16]:





Age                1.423610
Workclass          4.000000
fnlwgt             0.443566
Education         11.000000
Education-Num     -0.420060
Marital Status     2.000000
Occupation         1.000000
Relationship       0.000000
Race               4.000000
Sex                1.000000
Capital Gain      -0.145920
Capital Loss      -0.216660
Hours per week     0.369519
Country           39.000000
Name: 246, dtype: float64

XGBoost



In [17]:

    
booster = bst.get_booster()
inst = np.array([X_train.iloc[i, :].values])
prediction = bst.predict(inst)
print('predicted class (transformed to 0/1): ' + str(bst.predict(inst)[0]))
print('actual class (transformed to 0/1): ' + str(y_train[i]))
print('precicted class: {} of {}'.format(
    str(label_binarizer.classes_[int(prediction)]), label_binarizer.classes_))









    



predicted class (transformed to 0/1): 1.0
actual class (transformed to 0/1): 1.0
precicted class:  >50K of [' <=50K' ' >50K']



In [18]:

    
dtrain = xgb.DMatrix(np.array([X_train.iloc[i, :].values]))
print('Predicted probability: ' + str(booster.predict(dtrain)))
print('Predicted raw value: {}\n'.format(
    booster.predict(dtrain, output_margin=True)))
contribs = booster.predict(dtrain, pred_contribs=True)
contributions2 = zip(column_names[:-1] + ['Bias'], contribs[0])
print('Contributions (sorted by absolute value):\n' + '\n'.join(str(x)
                                                                for x in sorted(contributions2, key=lambda x: abs(x[1]), reverse=True)))
print("Sum of contributions {}".format(
    booster.predict(dtrain, pred_contribs=True).sum()))









    



Predicted probability: [ 0.55912364]
Predicted raw value: [ 0.23760611]

Contributions (sorted by absolute value):
('Bias', -1.3538995)
('Age', 0.56285393)
('Relationship', 0.51703531)
('Hours per week', 0.39134136)
('Marital Status', 0.37797219)
('Education-Num', -0.36059847)
('fnlwgt', 0.1621342)
('Capital Gain', -0.14017731)
('Sex', 0.084330857)
('Occupation', -0.067596197)
('Workclass', 0.061030753)
('Capital Loss', -0.057126872)
('Country', 0.031709559)
('Race', 0.026004098)
('Education', 0.0025927317)
Sum of contributions 0.23760664463043213



In [19]:

    
import shap
# load JS visualization code to notebook
shap.initjs()



In [20]:

    
dmatrix = xgb.DMatrix(X_train.values)
shap_values = bst.get_booster().predict(dmatrix, pred_contribs=True)
# visualize the first prediction's explaination
shap.visualize(
    shap_values[i, :], feature_names=df.columns[:-1], data=X_train.iloc[i, :].values)









    Out[20]:

LIME - Local Interpretable Model-agnostic Explanations



In [21]:

    
categorical_features_idxs = [1, 3, 5, 6, 7, 8, 9, 13]
categorical_names = {}
for feature in categorical_features_idxs:
    le = sklearn.preprocessing.LabelEncoder()
    le.fit(df.iloc[:, feature])
    categorical_names[feature] = le.classes_

explainer = lime.lime_tabular.LimeTabularExplainer(X_train.values,  # fitted_df.iloc[:,:-1].values.astype(float),
                                                   feature_names=['Age', 'Workclass', 'fnlwgt', 'Education', 'Education-Num',
                                                                  'Marital Status', 'Occupation', 'Relationship', 'Race', 'Sex',
                                                                  'Capital gain', 'Capital Loss', 'Hours per week', 'Country'],
                                                   categorical_features=categorical_features_idxs,
                                                   categorical_names=categorical_names,
                                                   class_names=label_binarizer.classes_,
                                                   kernel_width=3, verbose=False)



In [22]:

    
# Custom predict function not necessary, as all date has been transformed already
# predict_fn = lambda x: bst.predict_proba(mapper.fit_transform(x)).astype(float)
exp = explainer.explain_instance(
    X_train.iloc[i, :].values, bst.predict_proba, num_features=4)
exp.show_in_notebook(show_all=False)









    





        
        
        
Feature Value
Capital gain -0.15
Education-Num -0.42
Relationship= Husband True
Marital Status= Married-civ-spouse True

Comparision between LIME and SHAP



In [23]:

    
exp.show_in_notebook(show_all=False)
shap.visualize(
    shap_values[i, :], feature_names=df.columns[:-1], data=X_train.iloc[i, :].values)









    





        
        
        
Feature Value
Capital gain -0.15
Education-Num -0.42
Relationship= Husband True
Marital Status= Married-civ-spouse True
        
        
        






    Out[23]:

Model explanation



In [24]:

    
legend_text = """Instance:  {0}

{1}

Predicted leaf: {2}
""".format(i, X_train.iloc[i, :], booster.predict(dtrain, pred_leaf=True)[0][0])
legend_test = '''
Instance:  + str(i)
'''



In [25]:

    
%config InlineBackend.figure_format = 'retina'
plt.rc('figure', figsize=(50, 50))
fig = plt.figure()
ax = fig.add_subplot(111)
xgb.plot_tree(bst, fmap='data/adult.fmap', num_trees=0, rankdir='LR', ax=ax)
ax.text(0.0, 0.7, legend_text, bbox={
        'facecolor': 'gray', 'alpha': 0.5, 'pad': 5}, fontsize=30, transform=ax.transAxes)
plt.show()



In [26]:

    
plt.rc('figure', figsize=(15, 15))
scores = booster.get_score(fmap='data/adult.fmap', importance_type='gain')
xgb.plot_importance(scores, xlabel='Gain')









    Out[26]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f1884c1be80>

	Age	Workclass	fnlwgt	Education	Education-Num	Marital Status	Occupation	Relationship	Race	Sex	Capital Gain	Capital Loss	Hours per week	Country	Label
0	39	State-gov	77516	Bachelors	13	Never-married	Adm-clerical	Not-in-family	White	Male	2174	0	40	United-States	<=50K
1	50	Self-emp-not-inc	83311	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	13	United-States	<=50K
2	38	Private	215646	HS-grad	9	Divorced	Handlers-cleaners	Not-in-family	White	Male	0	0	40	United-States	<=50K
3	53	Private	234721	11th	7	Married-civ-spouse	Handlers-cleaners	Husband	Black	Male	0	0	40	United-States	<=50K
4	28	Private	338409	Bachelors	13	Married-civ-spouse	Prof-specialty	Wife	Black	Female	0	0	40	Cuba	<=50K
5	37	Private	284582	Masters	14	Married-civ-spouse	Exec-managerial	Wife	White	Female	0	0	40	United-States	<=50K
6	49	Private	160187	9th	5	Married-spouse-absent	Other-service	Not-in-family	Black	Female	0	0	16	Jamaica	<=50K
7	52	Self-emp-not-inc	209642	HS-grad	9	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	45	United-States	>50K
8	31	Private	45781	Masters	14	Never-married	Prof-specialty	Not-in-family	White	Female	14084	0	50	United-States	>50K
9	42	Private	159449	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	White	Male	5178	0	40	United-States	>50K
10	37	Private	280464	Some-college	10	Married-civ-spouse	Exec-managerial	Husband	Black	Male	0	0	80	United-States	>50K
11	30	State-gov	141297	Bachelors	13	Married-civ-spouse	Prof-specialty	Husband	Asian-Pac-Islander	Male	0	0	40	India	>50K
12	23	Private	122272	Bachelors	13	Never-married	Adm-clerical	Own-child	White	Female	0	0	30	United-States	<=50K
13	32	Private	205019	Assoc-acdm	12	Never-married	Sales	Not-in-family	Black	Male	0	0	50	United-States	<=50K
14	40	Private	121772	Assoc-voc	11	Married-civ-spouse	Craft-repair	Husband	Asian-Pac-Islander	Male	0	0	40	?	>50K
15	34	Private	245487	7th-8th	4	Married-civ-spouse	Transport-moving	Husband	Amer-Indian-Eskimo	Male	0	0	45	Mexico	<=50K
16	25	Self-emp-not-inc	176756	HS-grad	9	Never-married	Farming-fishing	Own-child	White	Male	0	0	35	United-States	<=50K
17	32	Private	186824	HS-grad	9	Never-married	Machine-op-inspct	Unmarried	White	Male	0	0	40	United-States	<=50K
18	38	Private	28887	11th	7	Married-civ-spouse	Sales	Husband	White	Male	0	0	50	United-States	<=50K
19	43	Self-emp-not-inc	292175	Masters	14	Divorced	Exec-managerial	Unmarried	White	Female	0	0	45	United-States	>50K
20	40	Private	193524	Doctorate	16	Married-civ-spouse	Prof-specialty	Husband	White	Male	0	0	60	United-States	>50K
21	54	Private	302146	HS-grad	9	Separated	Other-service	Unmarried	Black	Female	0	0	20	United-States	<=50K
22	35	Federal-gov	76845	9th	5	Married-civ-spouse	Farming-fishing	Husband	Black	Male	0	0	40	United-States	<=50K
23	43	Private	117037	11th	7	Married-civ-spouse	Transport-moving	Husband	White	Male	0	2042	40	United-States	<=50K
24	59	Private	109015	HS-grad	9	Divorced	Tech-support	Unmarried	White	Female	0	0	40	United-States	<=50K
25	56	Local-gov	216851	Bachelors	13	Married-civ-spouse	Tech-support	Husband	White	Male	0	0	40	United-States	>50K
26	19	Private	168294	HS-grad	9	Never-married	Craft-repair	Own-child	White	Male	0	0	40	United-States	<=50K
27	54	?	180211	Some-college	10	Married-civ-spouse	?	Husband	Asian-Pac-Islander	Male	0	0	60	South	>50K
28	39	Private	367260	HS-grad	9	Divorced	Exec-managerial	Not-in-family	White	Male	0	0	80	United-States	<=50K
29	49	Private	193366	HS-grad	9	Married-civ-spouse	Craft-repair	Husband	White	Male	0	0	40	United-States	<=50K
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
32531	30	?	33811	Bachelors	13	Never-married	?	Not-in-family	Asian-Pac-Islander	Female	0	0	99	United-States	<=50K
32532	34	Private	204461	Doctorate	16	Married-civ-spouse	Prof-specialty	Husband	White	Male	0	0	60	United-States	>50K
32533	54	Private	337992	Bachelors	13	Married-civ-spouse	Exec-managerial	Husband	Asian-Pac-Islander	Male	0	0	50	Japan	>50K
32534	37	Private	179137	Some-college	10	Divorced	Adm-clerical	Unmarried	White	Female	0	0	39	United-States	<=50K
32535	22	Private	325033	12th	8	Never-married	Protective-serv	Own-child	Black	Male	0	0	35	United-States	<=50K
32536	34	Private	160216	Bachelors	13	Never-married	Exec-managerial	Not-in-family	White	Female	0	0	55	United-States	>50K
32537	30	Private	345898	HS-grad	9	Never-married	Craft-repair	Not-in-family	Black	Male	0	0	46	United-States	<=50K
32538	38	Private	139180	Bachelors	13	Divorced	Prof-specialty	Unmarried	Black	Female	15020	0	45	United-States	>50K
32539	71	?	287372	Doctorate	16	Married-civ-spouse	?	Husband	White	Male	0	0	10	United-States	>50K
32540	45	State-gov	252208	HS-grad	9	Separated	Adm-clerical	Own-child	White	Female	0	0	40	United-States	<=50K
32541	41	?	202822	HS-grad	9	Separated	?	Not-in-family	Black	Female	0	0	32	United-States	<=50K
32542	72	?	129912	HS-grad	9	Married-civ-spouse	?	Husband	White	Male	0	0	25	United-States	<=50K
32543	45	Local-gov	119199	Assoc-acdm	12	Divorced	Prof-specialty	Unmarried	White	Female	0	0	48	United-States	<=50K
32544	31	Private	199655	Masters	14	Divorced	Other-service	Not-in-family	Other	Female	0	0	30	United-States	<=50K
32545	39	Local-gov	111499	Assoc-acdm	12	Married-civ-spouse	Adm-clerical	Wife	White	Female	0	0	20	United-States	>50K
32546	37	Private	198216	Assoc-acdm	12	Divorced	Tech-support	Not-in-family	White	Female	0	0	40	United-States	<=50K
32547	43	Private	260761	HS-grad	9	Married-civ-spouse	Machine-op-inspct	Husband	White	Male	0	0	40	Mexico	<=50K
32548	65	Self-emp-not-inc	99359	Prof-school	15	Never-married	Prof-specialty	Not-in-family	White	Male	1086	0	60	United-States	<=50K
32549	43	State-gov	255835	Some-college	10	Divorced	Adm-clerical	Other-relative	White	Female	0	0	40	United-States	<=50K
32550	43	Self-emp-not-inc	27242	Some-college	10	Married-civ-spouse	Craft-repair	Husband	White	Male	0	0	50	United-States	<=50K
32551	32	Private	34066	10th	6	Married-civ-spouse	Handlers-cleaners	Husband	Amer-Indian-Eskimo	Male	0	0	40	United-States	<=50K
32552	43	Private	84661	Assoc-voc	11	Married-civ-spouse	Sales	Husband	White	Male	0	0	45	United-States	<=50K
32553	32	Private	116138	Masters	14	Never-married	Tech-support	Not-in-family	Asian-Pac-Islander	Male	0	0	11	Taiwan	<=50K
32554	53	Private	321865	Masters	14	Married-civ-spouse	Exec-managerial	Husband	White	Male	0	0	40	United-States	>50K
32555	22	Private	310152	Some-college	10	Never-married	Protective-serv	Not-in-family	White	Male	0	0	40	United-States	<=50K
32556	27	Private	257302	Assoc-acdm	12	Married-civ-spouse	Tech-support	Wife	White	Female	0	0	38	United-States	<=50K
32557	40	Private	154374	HS-grad	9	Married-civ-spouse	Machine-op-inspct	Husband	White	Male	0	0	40	United-States	>50K
32558	58	Private	151910	HS-grad	9	Widowed	Adm-clerical	Unmarried	White	Female	0	0	40	United-States	<=50K
32559	22	Private	201490	HS-grad	9	Never-married	Adm-clerical	Own-child	White	Male	0	0	20	United-States	<=50K
32560	52	Self-emp-inc	287927	HS-grad	9	Married-civ-spouse	Exec-managerial	Wife	White	Female	15024	0	40	United-States	>50K

Feature	Value
Capital gain	-0.15
Education-Num	-0.42
Relationship= Husband	True
Marital Status= Married-civ-spouse	True